Invasive Tightly Coupled Processor Arrays by VAHID LARI

2.8 Related Work

The use of coarse-grained reconfigurable arrays (CGRAs) for data-intensive computations has received significant research interest due to their superiority over general-purpose processors (GPPs) in terms of power consumption and performance. As explained in Sect. 1.2, such architectures offer high power efficiency while at the same time gaining orders of magnitude in performance for loop executions when compared with GPPs. Hartenstein [50] classifies CGRA architectures based on their interconnection structures, namely as mesh, linear, or crossbar architectures. Examples of mesh-based CGRAs are the KressArray [51], RAW [52], and the ADRES architecture [53]. RaPiD [54] and PipeRench [55] consist of a linear array of PEs, while PADDI-2 [56] as well as Pleiades [57] are classified as crossbar architectures. HoneyComb [58] provides more flexible connectivity: this CGRA offers an array of hexagonally shaped cells, where each cell is directly connected to six neighbours through reconfigurable bidirectional links. Such an interconnect structure improves reachability and reduces communication latency between cells at the cost of higher routing overhead. However, the use of CGRAs exposes challenges to system designers: the compilation flow for these architectures is complex compared to that for GPPs, and since CGRAs are only able to execute loops, they need to be coupled to other cores on which all other parts of a program are executed. This coupling introduces run-time and design-time overheads.
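To make the classification concrete, the following sketch (not taken from any of the cited architectures) merely enumerates the direct-neighbour offsets of a PE in a 4-neighbour mesh versus a cell in a 6-neighbour hexagonal array; the axial coordinate scheme used for the hexagonal case is an assumption for illustration only. The difference in the number of direct links per cell is the structural reason behind the reachability and routing-overhead trade-off mentioned above.

/* Illustrative sketch only: neighbour sets of a mesh-connected PE
 * versus a hexagonal cell (axial coordinates assumed). */
#include <stdio.h>

typedef struct { int dx; int dy; } offset_t;

/* Four direct neighbours of a mesh-connected PE. */
static const offset_t mesh_neighbours[4] = {
    { 1, 0 }, { -1, 0 }, { 0, 1 }, { 0, -1 }
};

/* Six direct neighbours of a hexagonal cell. */
static const offset_t hex_neighbours[6] = {
    { 1, 0 }, { -1, 0 }, { 0, 1 }, { 0, -1 }, { 1, -1 }, { -1, 1 }
};

int main(void) {
    /* More direct links per cell shorten paths (better reachability,
     * lower latency) but require more routing resources per cell. */
    printf("mesh links per PE: %zu\n",
           sizeof mesh_neighbours / sizeof mesh_neighbours[0]);
    printf("hex  links per PE: %zu\n",
           sizeof hex_neighbours / sizeof hex_neighbours[0]);
    return 0;
}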

Concerning compilation approaches for nested loops, there has been a significant amount of work in the literature. One commonly cited approach is loop tiling [59–61], which applies transformations to split loop iterations into exactly as many congruent sets of computations (tiles) as there are available processors. Examples of such tiling mechanisms may be found in [30, 31, 62–65]; these basically generate code for fixed tile sizes and are hence inflexible with respect to a varying number of available resources. However, this contradicts the run-time adaptability required by today's programming models. Therefore, attention has turned to symbolic loop tiling [66, 67]. This initial work has been followed by a breakthrough solution for symbolic loop tiling on CGRAs proposed by Teich et al. in [32, 68], in which a two-step approach of parameterised (symbolic) tiling and symbolic scheduling is used to statically determine symbolic latency-optimal schedules. First, the loop iterations are tiled symbolically into orthotopes of parameterised extents. Then, the tiled programs are scheduled symbolically on a processor array of unknown (symbolic) size. In simple terms, the generated code adapts to the number of resources that are available on a CGRA at run time, e.g. at invasion time, without the need for run-time re-compilation.
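The minimal sketch below contrasts fixed and symbolic tiling on a one-dimensional loop. It is not taken from [32, 68] and greatly simplifies the symbolic scheduling step; the problem size N and the number of PEs p obtained at invasion time are assumed parameters for illustration.

/* Minimal sketch: fixed versus symbolic tiling of a 1-D loop. */
#include <stddef.h>

/* Original loop: one statement per iteration. */
void original(float *a, const float *b, size_t N) {
    for (size_t i = 0; i < N; i++)
        a[i] = 2.0f * b[i];
}

/* Fixed tiling: the tile size T is a compile-time constant, so the
 * generated code fits only a machine offering N/T processors. */
#define T 64
void tiled_fixed(float *a, const float *b, size_t N) {
    for (size_t ii = 0; ii < N; ii += T)                 /* inter-tile loop */
        for (size_t i = ii; i < ii + T && i < N; i++)    /* intra-tile loop */
            a[i] = 2.0f * b[i];
}

/* Symbolic tiling: the tile extent t = ceil(N / p) remains a run-time
 * parameter, so the same code adapts to however many PEs p were
 * claimed by an invasion, without re-compilation. */
void tiled_symbolic(float *a, const float *b, size_t N, size_t p) {
    size_t t = (N + p - 1) / p;                          /* symbolic tile extent */
    for (size_t ii = 0; ii < N; ii += t)
        for (size_t i = ii; i < ii + t && i < N; i++)
            a[i] = 2.0f * b[i];
}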

Utilisation tracking adds to the run-time overheads when coupling a reconfigurable architecture such as a CGRA to other processors. There is little work that deals with run-time application mapping on CGRAs. The MORPHEUS project [69] aims to develop a new heterogeneous reconfigurable SoC with various types of reconfiguration granularity. Resano and others [70] developed a hybrid design/run-time


